feat: implement CD*/CG* methyl wildcard labeling #8

Draft
tsenoner wants to merge 11 commits into develop from feat/methyl-wildcard-convention


@tsenoner (Collaborator) commented Mar 5, 2026

Summary

  • Adds stereospecific methyl assignment detection for Leu (CD1/CD2) and Val (CG1/CG2) residues
  • When stereospecific assignment is not confirmed via metadata, methyl atoms are relabeled with CD*/CG* wildcards
  • Controlled by the --include-methyl-shifts CLI flag (default off), so existing pipeline behavior is unchanged
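A minimal sketch of the relabeling rule described above, assuming a hypothetical METHYL_ATOMS layout that maps each residue to its pair of stereospecific labels and its wildcard. The PR only describes the constant as holding "stereo/wildcard definitions", and `label_methyl` is an illustrative helper, not part of trizod:

```python
from typing import Dict, Tuple

# Hypothetical layout: residue -> ((stereospecific labels), wildcard label).
# The real METHYL_ATOMS constant in trizod/constants.py may be shaped differently.
METHYL_ATOMS: Dict[str, Tuple[Tuple[str, str], str]] = {
    "LEU": (("CD1", "CD2"), "CD*"),
    "VAL": (("CG1", "CG2"), "CG*"),
}


def label_methyl(residue: str, atom: str, stereo_confirmed: bool) -> str:
    """Return the output label for an atom (hypothetical helper)."""
    entry = METHYL_ATOMS.get(residue.upper())
    if entry is None:
        return atom  # not Leu/Val: leave the label untouched
    stereo_labels, wildcard = entry
    if atom not in stereo_labels:
        return atom  # e.g. CA: not one of the prochiral methyl carbons
    # Keep CD1/CD2 (CG1/CG2) only when the stereospecific assignment is confirmed.
    return atom if stereo_confirmed else wildcard
```

For example, `label_methyl("LEU", "CD1", stereo_confirmed=False)` yields `"CD*"`, while a confirmed assignment keeps `"CD1"`.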

Changes

  • trizod/constants.py: Add METHYL_ATOMS constant
  • trizod/bmrb/bmrb.py: Add BmrbEntry._detect_stereospecific_methyls() method and get_methyl_shifts() function
  • trizod/trizod.py: Wire --include-methyl-shifts through CLI → fill_row_data → output_dataset
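One consequence of the wildcard is that both methyl shifts of a residue end up under the same label. The sketch below shows one way a function like get_methyl_shifts() could collect them. The row format, the per-entry `stereo_confirmed` flag, and the name `get_methyl_shifts_sketch` are assumptions; the PR does not show the function's signature or return type:

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

# Hypothetical lookup: (residue, stereospecific label) -> wildcard label.
WILDCARD = {
    ("LEU", "CD1"): "CD*",
    ("LEU", "CD2"): "CD*",
    ("VAL", "CG1"): "CG*",
    ("VAL", "CG2"): "CG*",
}

# Assumed input row: (sequence id, residue name, atom name, shift in ppm).
Row = Tuple[int, str, str, float]


def get_methyl_shifts_sketch(
    rows: Iterable[Row], stereo_confirmed: bool
) -> Dict[Tuple[int, str], List[float]]:
    """Collect Leu/Val methyl carbon shifts, keyed by (seq_id, label)."""
    grouped: DefaultDict[Tuple[int, str], List[float]] = defaultdict(list)
    for seq_id, residue, atom, shift in rows:
        wildcard = WILDCARD.get((residue.upper(), atom))
        if wildcard is None:
            continue  # skip everything that is not a Leu/Val methyl carbon
        label = atom if stereo_confirmed else wildcard
        grouped[(seq_id, label)].append(shift)
    # Sort each list so the two wildcard values come out in a fixed order.
    return {key: sorted(values) for key, values in grouped.items()}
```

With made-up rows `[(12, "LEU", "CD1", 25.1), (12, "LEU", "CD2", 23.4)]` and no confirmed assignment, both values land under `(12, "CD*")`. Sorting them is an arbitrary choice that only makes the output deterministic once the CD1/CD2 distinction is dropped.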

Test plan

  • Process an entry with known stereospecific assignments (e.g., BMRB 18414) → should preserve CD1/CD2 labels
  • Process an entry without stereospecific annotation → should use CD*/CG* wildcards
  • Verify that the --include-methyl-shifts flag adds methyl data to the JSON output
  • Verify that the default behavior (flag off) produces output identical to before
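The two flag checks in the test plan could be written as pytest-style tests along these lines. `build_record` is a hypothetical stand-in for the fill_row_data → output_dataset path, the `methyl_shifts` key is an assumed name, and declaring the flag with `store_true` is an assumption about trizod's CLI code:

```python
import argparse
import json
from typing import Any, Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Flag name from this PR; declaring it with store_true is an assumption.
    parser.add_argument(
        "--include-methyl-shifts",
        action="store_true",
        help="add Leu/Val methyl shifts to the JSON output",
    )
    return parser.parse_args(argv)


def build_record(
    base: Dict[str, Any], methyl: Dict[str, Any], include_methyl_shifts: bool
) -> Dict[str, Any]:
    """Hypothetical stand-in for the fill_row_data -> output_dataset path."""
    record = dict(base)
    if include_methyl_shifts:
        record["methyl_shifts"] = methyl  # assumed key name
    return record


def test_flag_off_keeps_output_identical() -> None:
    base = {"entry": "demo", "shifts": [1.0, 2.0]}
    args = parse_args([])
    assert args.include_methyl_shifts is False
    record = build_record(base, {"12:CD*": [23.4, 25.1]}, args.include_methyl_shifts)
    assert json.dumps(record, sort_keys=True) == json.dumps(base, sort_keys=True)


def test_flag_on_adds_methyl_data() -> None:
    args = parse_args(["--include-methyl-shifts"])
    record = build_record({}, {"12:CD*": [23.4, 25.1]}, args.include_methyl_shifts)
    assert "methyl_shifts" in record
```

Comparing serialized JSON with `sort_keys=True` mirrors the "identical output" criterion without depending on dict insertion order.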

🤖 Generated with Claude Code

tsenoner and others added 11 commits November 28, 2025 14:48
- Remove duplicate entries (*.txt was listed twice)
- Organize into logical sections with comments
- Add exception for trizod/potenci/data/ directory
- Use proper gitignore patterns (directories with trailing /)
- POTENCI data files are now included as required dependencies
- Add 6 CSV tables extracted from inline strings
- Add comprehensive README documenting data format
- Data sourced from Nielsen & Mulder (2018) POTENCI algorithm

Files added:
- tablecent.csv: Central residue chemical shifts
- tablenei.csv: Neighbor residue corrections
- tabletermcorrs.csv: Terminal corrections
- tabletempk.csv: Temperature coefficients
- tablecombdevs.csv: Combinatorial deviations
- tablephshifts.csv: pH-dependent shifts
- README.md: Comprehensive documentation

- Add comprehensive type hints (ShiftDict, CorrectionDict, etc.)
- Replace unsafe eval() calls with safe float conversion
- Implement CSV-based data loading with caching
- Add PhysicalConstants dataclass
- Remove all backward compatibility wrappers
- Update module docstring with academic references

Security: Eliminates eval() vulnerability
Performance: Cached data loading with lru_cache
Maintainability: Type-safe, well-documented API
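A minimal sketch of CSV-based table loading that uses `float()` in place of `eval()` and `functools.lru_cache` for caching. The CSV layout (a key column followed by numeric columns), the `parse_table`/`load_table` names, and the nested-dict result are assumptions; the actual POTENCI table formats are described in the README added with the data files:

```python
import csv
import io
from functools import lru_cache
from typing import Dict

ShiftTable = Dict[str, Dict[str, float]]


def parse_table(text: str) -> ShiftTable:
    """Parse 'key,col1,col2,...' CSV text into nested dicts of floats."""
    reader = csv.DictReader(io.StringIO(text))
    key_field = reader.fieldnames[0]  # first column holds the row key
    table: ShiftTable = {}
    for row in reader:
        key = row.pop(key_field)
        # float() only accepts numeric literals; eval() would execute any expression.
        table[key] = {col: float(val) for col, val in row.items()}
    return table


@lru_cache(maxsize=None)
def load_table(path: str) -> ShiftTable:
    """Read and parse a CSV file once; later calls return the cached dict."""
    with open(path, newline="", encoding="utf-8") as fh:
        return parse_table(fh.read())
```

Because `lru_cache` hands back the same dict object on every hit, callers should treat the result as read-only. A malicious cell such as `__import__('os').getcwd()` now raises `ValueError` instead of being executed.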
- Add comprehensive module docstring
- Fix typo: logging.waring() to logging.warning()
- Update outdated comments (python2.x to python3.10+)
- Replace ##-style comments with proper documentation
- Update to use new constants API (PHYSICAL_CONSTANTS, load_* functions)
- Improve CLI documentation in main()

- Export modern API: load_central_shifts, PHYSICAL_CONSTANTS, etc.
- Remove legacy exports: R, a, b, cutoff, e, ncycles
- Add module docstring
- Update __all__ for clean public API

- Remove setup.py (replaced by pyproject.toml)
- Add uv.lock for reproducible dependencies
- Configure hatchling to include potenci/data files
- Update build system to use modern Python packaging standards

Migration: setup.py → pyproject.toml + uv
Build backend: hatchling
Lock file: uv.lock for reproducibility
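With hatchling packaging the potenci/data files, code can read them from the installed package rather than from a path relative to the source tree. A sketch using the standard `importlib.resources.files` API (Python 3.9+); `read_packaged_text` is a hypothetical helper, and whether trizod loads its tables this way is not shown in the PR:

```python
from importlib.resources import files


def read_packaged_text(package: str, *parts: str) -> str:
    """Read a text file shipped inside an installed package (hypothetical helper)."""
    resource = files(package)
    for part in parts:
        resource = resource / part  # Traversable objects support the / operator
    return resource.read_text(encoding="utf-8")
```

Assuming `trizod.potenci` is an importable package, `read_packaged_text("trizod.potenci", "data", "tablecent.csv")` would return the raw CSV text wherever the package is installed, including from a built wheel.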
- Run ruff check --fix --unsafe-fixes on all modules
- Apply ruff format for consistent code style
- Fix import ordering, comparison operators, nested if statements
- Remove unused variables and imports
- Add explicit exception handling (no bare excepts)
- Rename functions to follow snake_case convention:
  - get_pH → get_ph
  - convChi2CDF → conv_chi2_cdf
  - get_offset_corrected_wSCS → get_offset_corrected_wscs
- Rename exceptions to follow Error suffix convention:
  - Found → FoundError
  - OffsetTooLargeException → OffsetTooLargeError
  - OffsetCausedFilterException → OffsetCausedFilterError
  - FilterException → FilterError
- Fix lambda loop variable binding issue
- Add exception chaining with "from e"
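The loop-binding and exception-chaining fixes address two common Python pitfalls. The sketch below illustrates both in isolation; the scaler functions and `ShiftParseError` are hypothetical examples, not code from trizod:

```python
from typing import Callable, List


def make_scalers_buggy(factors: List[float]) -> List[Callable[[float], float]]:
    # Late binding: every lambda reads `f` when it is called, i.e. after the
    # comprehension has finished, so all of them see the last factor.
    return [lambda x: x * f for f in factors]


def make_scalers_fixed(factors: List[float]) -> List[Callable[[float], float]]:
    # A default argument captures the value of `f` at definition time.
    return [lambda x, f=f: x * f for f in factors]


class ShiftParseError(ValueError):
    """Hypothetical error type used only for this example."""


def parse_shift(text: str) -> float:
    try:
        return float(text)
    except ValueError as e:
        # "from e" stores the original error as __cause__, so the traceback
        # shows both the low-level ValueError and this higher-level message.
        raise ShiftParseError(f"invalid chemical shift value: {text!r}") from e
```

Calling each buggy scaler on `10.0` with factors `[1.0, 2.0, 3.0]` gives `30.0` three times, while the fixed version gives `10.0`, `20.0`, `30.0`.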
…equires-python

- Replace deprecated np.float with np.float64 (removed in NumPy 1.24+)
- Replace eval() with float() in read_csv_pkaoutput (security fix)
- Fix argument count mismatch in getpredshifts_arr -> getphcorrs_arr call
- Remove dead code: unused log_fun(), no-op str(i+1) statements
- Bump requires-python from >=3.8 to >=3.9 (BooleanOptionalAction, dict |)
- Exclude test/ from ruff config (pre-existing issues, not part of package)
- Add implementation plan document for BMRB expert suggestions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
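The requires-python bump is tied to two Python 3.9 additions, `argparse.BooleanOptionalAction` and the dict union operator `|`. A small illustration of both; the `--verbose` option and the settings keys are made up for the example and are not trizod options:

```python
import argparse
import sys

# Both features below need Python 3.9 or newer.
assert sys.version_info >= (3, 9)

parser = argparse.ArgumentParser()
# BooleanOptionalAction generates a paired --verbose / --no-verbose option.
parser.add_argument("--verbose", action=argparse.BooleanOptionalAction, default=False)

defaults = {"temperature": 298.0, "ph": 7.0}
overrides = {"ph": 6.5}
# The dict union operator returns a new dict; right-hand values win on key clashes.
settings = defaults | overrides
```

On Python 3.8 the `argparse.BooleanOptionalAction` lookup raises `AttributeError` and `dict | dict` raises `TypeError`, which is why `>=3.8` was no longer accurate.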
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace blanket *.csv/*.txt/*.zip patterns with specific directory
rules for clarity and transparency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add stereospecific methyl assignment detection for Leu (CD1/CD2) and
Val (CG1/CG2). When stereospecific assignment is not confirmed, methyl
atoms are labeled with CD*/CG* wildcards to aid downstream automatic
resonance assignment protocols.

- Add METHYL_ATOMS constant with stereo/wildcard definitions
- Add _detect_stereospecific_methyls() to BmrbEntry via metadata scan
- Add get_methyl_shifts() function for methyl shift extraction
- Add --include-methyl-shifts CLI flag (default off)
- Wire methyl shifts through fill_row_data → output_dataset pipeline

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tsenoner marked this pull request as draft March 10, 2026 13:42